- Home
- Search Results
- Page 1 of 1
Search for: All records
-
Total Resources2
- Resource Type
-
0001000001000000
- More
- Availability
-
20
- Author / Contributor
- Filter by Author / Creator
-
-
Flemons, Paul (2)
-
Alzuru, Icaro (1)
-
Bibby, James (1)
-
Ceccaroni, Luigi (1)
-
Fagan, Laura (1)
-
Fortes, Jose A.B. (1)
-
Matsunaga, Andrea (1)
-
Michael, Katina (1)
-
Oliver, Jessica L. (1)
-
Roger, Erin (1)
-
Stephens, Rhiannon (1)
-
Tsugawa, Mauricio (1)
-
#Tyler Phillips, Kenneth E. (0)
-
#Willis, Ciara (0)
-
& Abreu-Ramos, E. D. (0)
-
& Abramson, C. I. (0)
-
& Abreu-Ramos, E. D. (0)
-
& Adams, S.G. (0)
-
& Ahmed, K. (0)
-
& Ahmed, Khadija. (0)
-
- Filter by Editor
-
-
& Spizer, S. M. (0)
-
& . Spizer, S. (0)
-
& Ahn, J. (0)
-
& Bateiha, S. (0)
-
& Bosch, N. (0)
-
& Brennan K. (0)
-
& Brennan, K. (0)
-
& Chen, B. (0)
-
& Chen, Bodong (0)
-
& Drown, S. (0)
-
& Ferretti, F. (0)
-
& Higgins, A. (0)
-
& J. Peters (0)
-
& Kali, Y. (0)
-
& Ruiz-Arias, P.M. (0)
-
& S. Spitzer (0)
-
& Sahin. I. (0)
-
& Spitzer, S. (0)
-
& Spitzer, S.M. (0)
-
(submitted - in Review for IEEE ICASSP-2024) (0)
-
-
Have feedback or suggestions for a way to improve these results?
!
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Information Extraction (IE) from imaged text is affected by the output quality of the text-recognition process. Misspelled or missing text may propagate errors or even preclude IE. Low confidence in automated methods is the reason why some IE projects rely exclusively on human work (crowdsourcing). That is the case of biological collections (biocollections), where the metadata (Darwin-core Terms) found in digitized labels are transcribed by citizen scientists. In this paper, we present an approach to reduce the number of crowdsourcing tasks required to obtain the transcription of the text found in biocollections' images. By using an ensemble of Optical Character Recognition (OCR) engines - OCRopus, Tesseract, and the Google Cloud OCR - our approach identifies the lines and characters that have a high probability of being correct. This reduces the need for crowdsourced transcription to be done for only low confidence fragments of text. The number of lines to transcribe is also reduced through hybrid human-machine crowdsourcing where the output of the ensemble of OCRs is used as the first "human" transcription of the redundant crowdsourcing process. Our approach was tested in six biocollections (2,966 images), reducing the number of crowdsourcing tasks by 76% (58% due to lines accepted by the ensemble of OCRs and about 18% due to accelerated convergence when using hybrid crowdsourcing). The automatically extracted text presented a character error rate of 0.001 (0.1%).more » « less
-
Ceccaroni, Luigi; Bibby, James; Roger, Erin; Flemons, Paul; Michael, Katina; Fagan, Laura; Oliver, Jessica L. (, Citizen Science: Theory and Practice)
An official website of the United States government
